We will continue to visualize the RNA-Seq dataset of CD8+ T cells that do not express the transcription factor Tcf1 (encoded by Tcf7) versus cells that do (WT = wildtype). Our goal is to represent this data in various forms over the next few sessions.
Last week, we generated plots (PCA/correlation heatmap) to visualize the quality of the samples. Today, we will work on visualizing group of genes identified as differentially expressed (DEGs) between the two groups being compared in this study (Tcf7_KO versus WT).
The volcano plot is a scatterplot that visualizes both the (log2) fold change and level of significance (usually −log10[p−value]). This data visualization enables the display of a large amount of data in a concise way. Typically, only a handful of datapoints in the volcano plot are of interest, also known as hits. The hits are datapoints that surpass a threshold that is defined by the user for both the significance and fold change. Annotation of hits (with names) is used to draw attention to these most relevant or interesting datapoints.
We will be generating volcano plots using:
ggplots2: an data visualization package that is offered via the tidyverse collection
EnhancedVolcano: a package created by Kevin Blighe, Sharmila Rana, and Myles Lewis to generate publication-ready volcano plots.
Remember to first check if you require a package for this tutorial using library(). Once the package is installed there is no need to reinstall it over and over again.
#BiocManager::install('EnhancedVolcano') #if you need to install
#BiocManager::install("biomaRt") #if you need to install
library(EnhancedVolcano)
library(dplyr)
library(tidyr)
library(ggplot2)
library(genekitr)
We will generate a volcano plot with ggplot2 that will take inspiration from this article.
deBock.et.al.2020
For the volcano plot above we can see that the authors are coloring “genes of significance” by p-adjusted values only. Whereby, genes that have a p-adjusted value of less than 0.05 are colored red and those that do not meet this threshold are being colored as black. Overall, this is perfectly fine to do as the authors wrote their methods transparently and leave no confusion with how the data was plotted.
resdata <- read.csv("TCF1KOvsWT_unfiltered_matrix.csv", header = T)
head(resdata)
## Gene_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000015143.16 6770.207 -2.946559 0.05835389 -50.49464 0
## 2 ENSMUSG00000015437.6 1365.097 6.213701 0.15815498 39.28868 0
## 3 ENSMUSG00000021756.13 5340.423 -1.932083 0.04966316 -38.90376 0
## 4 ENSMUSG00000025163.7 4445.134 2.453194 0.06245931 39.27668 0
## 5 ENSMUSG00000035042.3 2816.472 6.878698 0.14338097 47.97497 0
## 6 ENSMUSG00000039521.14 1506.650 6.560086 0.16962943 38.67304 0
## padj WT_CD8_rep1 WT_CD8_rep2 WT_CD8_rep3 TCF1_KO_CD8_rep1 TCF1_KO_CD8_rep2
## 1 0 12433.34924 11759.60918 11762.92037 1518.717 1477.525
## 2 0 30.97347 42.89393 34.62547 2565.046 2960.111
## 3 0 8406.79850 8492.04492 8490.81374 2237.728 2187.952
## 4 0 1421.78206 1263.94113 1434.79279 7804.848 7529.308
## 5 0 48.95806 40.03433 54.10229 5114.678 5337.308
## 6 0 32.97176 32.40875 29.21524 2819.828 3377.057
## TCF1_KO_CD8_rep3
## 1 1669.121
## 2 2556.930
## 3 2227.202
## 4 7216.133
## 5 6303.749
## 6 2748.418
str(resdata)
## 'data.frame': 19401 obs. of 13 variables:
## $ Gene_id : chr "ENSMUSG00000015143.16" "ENSMUSG00000015437.6" "ENSMUSG00000021756.13" "ENSMUSG00000025163.7" ...
## $ baseMean : num 6770 1365 5340 4445 2816 ...
## $ log2FoldChange : num -2.95 6.21 -1.93 2.45 6.88 ...
## $ lfcSE : num 0.0584 0.1582 0.0497 0.0625 0.1434 ...
## $ stat : num -50.5 39.3 -38.9 39.3 48 ...
## $ pvalue : num 0 0 0 0 0 ...
## $ padj : num 0 0 0 0 0 ...
## $ WT_CD8_rep1 : num 12433 31 8407 1422 49 ...
## $ WT_CD8_rep2 : num 11759.6 42.9 8492 1263.9 40 ...
## $ WT_CD8_rep3 : num 11762.9 34.6 8490.8 1434.8 54.1 ...
## $ TCF1_KO_CD8_rep1: num 1519 2565 2238 7805 5115 ...
## $ TCF1_KO_CD8_rep2: num 1478 2960 2188 7529 5337 ...
## $ TCF1_KO_CD8_rep3: num 1669 2557 2227 7216 6304 ...
Right now, there is not a column which indicates whether the gene is considered differentially expressed based on p-adjusted values alone. We will add this in.
threshold_padj <- resdata$padj < 0.05
length(which(threshold_padj)) ## Which values in an object are considered true and what is the length??
## [1] 4652
We now have created a logical vector. A TRUE value will correspond to genes that meet the criteria of having a padj of < 0.05 while FALSE means it is > 0.05 (or fails this criteria).
To add this vector to our results table we can use the $ notation on the left hand side of the assignment operator to create a column:
resdata$threshold <- threshold_padj ## Add vector as a column (threshold) to the resdata
str(resdata)
## 'data.frame': 19401 obs. of 14 variables:
## $ Gene_id : chr "ENSMUSG00000015143.16" "ENSMUSG00000015437.6" "ENSMUSG00000021756.13" "ENSMUSG00000025163.7" ...
## $ baseMean : num 6770 1365 5340 4445 2816 ...
## $ log2FoldChange : num -2.95 6.21 -1.93 2.45 6.88 ...
## $ lfcSE : num 0.0584 0.1582 0.0497 0.0625 0.1434 ...
## $ stat : num -50.5 39.3 -38.9 39.3 48 ...
## $ pvalue : num 0 0 0 0 0 ...
## $ padj : num 0 0 0 0 0 ...
## $ WT_CD8_rep1 : num 12433 31 8407 1422 49 ...
## $ WT_CD8_rep2 : num 11759.6 42.9 8492 1263.9 40 ...
## $ WT_CD8_rep3 : num 11762.9 34.6 8490.8 1434.8 54.1 ...
## $ TCF1_KO_CD8_rep1: num 1519 2565 2238 7805 5115 ...
## $ TCF1_KO_CD8_rep2: num 1478 2960 2188 7529 5337 ...
## $ TCF1_KO_CD8_rep3: num 1669 2557 2227 7216 6304 ...
## $ threshold : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
ggplot2We will begin by using the ggplot() function to initialize the basic graph structure. The basic idea is that you can specify different parts of the plot and add them together using the + operator. These parts are often referred to as layers.
As input ggplot() requires a data frame
Let’s start:
#ggplot(resdata) #what happens?
Notice, that you will get a blank plot because ggplot2 requires the user to specify layers using the + operator.
One type of layer is called geometric objects. Examples include:
geom_point, geom_jitter for scatter plots, dot plots, etc)geom_line, for time series, trend lines, etc)geom_boxplot, for boxplots!)Any plot created with ggplot() must have at least one geom. You can add a geom to a plot using the + operator
#ggplot(resdata) +
#geom_point() # note what happens here
You will find that even though we have added a layer by specifying geom_point, we still get an error. This is because each type of geom usually has a required set of aesthetics to be set. Aesthetic mappings are set with the aes() function and can be set inside geom_point(). If we supplied aesthetics within ggplot(), they will be used as defaults for every layer. Examples of aesthetics include:
We will begin by specifying the x- and y-axis since geom_point() requires this information for a scatter plot.
ggplot(resdata) +
geom_point(aes(x = log2FoldChange,
y = -log10(padj)))
Now, let’s move to customizing the appearance of the data points by adding color. We can color the points (i.e. those which represent significant genes) based on threshold column we created earlier.
ggplot(resdata) +
geom_point(aes(x=log2FoldChange, y=-log10(padj),
colour=threshold))
Lets take a second to change the look of the non-data plotting elements, such as the grid and labels. Many of these elements can be altered within a theme() layer.
The ggplot2 theme system handles non-data plot elements such as: + axis titles and label style + plot background + presence of major or minor gridlines + legend appearance
There are a few build-in themes we can use that will change the background/foreground colors by adding it as an additional layer. Themes can be found here
I like the classic theme, so let’s add it in.
ggplot(resdata) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=threshold)) +
theme_classic()
Let’s edit the title & x and y labels, including its size, text, and position. In addition, we want to remove the legend completely.
ggplot(resdata) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=threshold)) +
theme_classic() +
ggtitle("TCF7 KO versus WT") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "none",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25)))
We are almost there but the colors are not right. We can fix this with scale_color_manual and change the values to black and red like the article.
In addition, we are adding the parameter scale_x_continuous to set the x-axis limit. This is the default scales, we are simply adjusting it so that the plot is centered. We are also using coord_cartesian function. Setting “limits” with the Cartesian coordinate system will zoom the plot but will not change the underlying data. Here we are not changing the plot but are using the “clip” argument and setting it to “no”. This will allow the ‘outlier’ points to be drawn on the plot.
ggplot(resdata) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=threshold)) +
theme_classic() +
ggtitle("TCF7 KO versus WT") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "none",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("black", "red")) +
scale_x_continuous(limits = c(-20, 20)) +
coord_cartesian(ylim=c(0, 300), clip = "off")
This is a great way to get a global picture of what is going on, but what if wanted to label our plot with gene names from our DE list? (Notice, that the authors labeled HOXA9 on their plot.) HOXA9 is a gene symbol. However, this is where we hit a road block since our gene ids are currently listed as ENSEMBL gene ids.
head(resdata)
## Gene_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000015143.16 6770.207 -2.946559 0.05835389 -50.49464 0
## 2 ENSMUSG00000015437.6 1365.097 6.213701 0.15815498 39.28868 0
## 3 ENSMUSG00000021756.13 5340.423 -1.932083 0.04966316 -38.90376 0
## 4 ENSMUSG00000025163.7 4445.134 2.453194 0.06245931 39.27668 0
## 5 ENSMUSG00000035042.3 2816.472 6.878698 0.14338097 47.97497 0
## 6 ENSMUSG00000039521.14 1506.650 6.560086 0.16962943 38.67304 0
## padj WT_CD8_rep1 WT_CD8_rep2 WT_CD8_rep3 TCF1_KO_CD8_rep1 TCF1_KO_CD8_rep2
## 1 0 12433.34924 11759.60918 11762.92037 1518.717 1477.525
## 2 0 30.97347 42.89393 34.62547 2565.046 2960.111
## 3 0 8406.79850 8492.04492 8490.81374 2237.728 2187.952
## 4 0 1421.78206 1263.94113 1434.79279 7804.848 7529.308
## 5 0 48.95806 40.03433 54.10229 5114.678 5337.308
## 6 0 32.97176 32.40875 29.21524 2819.828 3377.057
## TCF1_KO_CD8_rep3 threshold
## 1 1669.121 TRUE
## 2 2556.930 TRUE
## 3 2227.202 TRUE
## 4 7216.133 TRUE
## 5 6303.749 TRUE
## 6 2748.418 TRUE
resdata$Gene_id <- sub("\\.\\d+", "", resdata$Gene_id)
ids <- resdata$Gene_id
gene_symbol_map <- transId(ids, transTo = "symbol", org = "mouse")
The option we used above was to specify the pattern as “.”, followed by one or more digits (\d+) at the end of the string and replace with blank (’"). regex
names(resdata)[1] = "input_id"
resdata_gene <- merge(resdata, gene_symbol_map, by.x = "input_id", by.y = "input_id")
str(resdata_gene)
## 'data.frame': 17276 obs. of 15 variables:
## $ input_id : chr "ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037" "ENSMUSG00000000056" ...
## $ baseMean : num 3904.96 66.3 4.94 833.78 25622.4 ...
## $ log2FoldChange : num 0.0747 0.3184 -0.2252 -0.2261 0.3501 ...
## $ lfcSE : num 0.0569 0.2499 0.9064 0.0819 0.1706 ...
## $ stat : num 1.311 1.274 -0.248 -2.76 2.052 ...
## $ pvalue : num 0.18981 0.2026 0.8038 0.00578 0.0402 ...
## $ padj : num 0.3838 0.401 0.9057 0.0241 0.119 ...
## $ WT_CD8_rep1 : num 3.60e+03 5.89e+01 9.99e-01 9.47e+02 1.80e+04 ...
## $ WT_CD8_rep2 : num 4013.92 53.38 9.53 850.25 25351.27 ...
## $ WT_CD8_rep3 : num 3795.82 64.92 5.41 899.18 24233.5 ...
## $ TCF1_KO_CD8_rep1: num 4075.6 78.88 4.53 737.14 32210.41 ...
## $ TCF1_KO_CD8_rep2: num 3943.78 56.67 5.06 767.1 27000.26 ...
## $ TCF1_KO_CD8_rep3: num 3998.7 85 4.1 801.8 26937.4 ...
## $ threshold : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ symbol : chr "Gnai3" "Cdc45" "Scml2" "Narf" ...
#Add a column to dataframe to specify up or down regulated
resdata_gene$diffexpressed <- "Unchanged"
#if log2FC > 1 and padj < 0.05 set as upregulated
resdata_gene$diffexpressed[resdata_gene$padj < 0.05 & resdata_gene$log2FoldChange > 1] <- "Upregulated"
#if log2FC > -1 and padj < 0.05 set as downregulated
resdata_gene$diffexpressed[resdata_gene$padj < 0.05 & resdata_gene$log2FoldChange < -1] <- "Downregulated"
Figure_Volcano
ggplot(resdata_gene) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=diffexpressed)) +
theme_classic() +
ggtitle("TCF7 KO versus WT") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("blue", "black", "red"),
labels = c("Downregulated", "Not Significant", "Upregulated")) +
scale_x_continuous(limits = c(-20, 20)) +
coord_cartesian(ylim=c(0, 300), clip = "off")
resdata_sig <- as.data.frame(resdata_gene) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
head(resdata_sig)
## input_id baseMean log2FoldChange lfcSE stat
## 1 ENSMUSG00000000301 36.06472 -1.245867 0.34387653 -3.623008
## 2 ENSMUSG00000000303 740.39042 4.274595 0.35738885 11.960628
## 3 ENSMUSG00000000489 96.63058 2.107325 0.23386230 9.010966
## 4 ENSMUSG00000000530 127.87156 -2.016456 0.19967045 -10.098920
## 5 ENSMUSG00000000631 1771.29476 1.183873 0.07321316 16.170217
## 6 ENSMUSG00000000817 1200.51462 6.961123 0.49820582 13.972385
## pvalue padj WT_CD8_rep1 WT_CD8_rep2 WT_CD8_rep3 TCF1_KO_CD8_rep1
## 1 2.91197e-04 1.75073e-03 44.961485 48.61312 58.43048 14.50715
## 2 5.71000e-33 3.32000e-31 92.920402 57.19191 68.16889 1068.08907
## 3 2.04000e-19 5.88000e-18 37.967476 33.36195 37.87160 132.37776
## 4 5.59000e-24 2.15000e-22 225.806568 216.37605 173.12733 56.21521
## 5 8.18000e-59 1.24000e-56 1067.085904 1046.61189 1136.14814 2619.44765
## 6 2.30000e-44 2.15000e-42 5.994865 28.59595 22.72296 2572.29940
## TCF1_KO_CD8_rep2 TCF1_KO_CD8_rep3 threshold symbol diffexpressed
## 1 25.30009 24.57602 TRUE Pemt Downregulated
## 2 1986.56336 1169.40889 TRUE Cdh1 Upregulated
## 3 148.76455 189.44014 TRUE Pdgfb Upregulated
## 4 46.55217 49.15204 TRUE Acvrl1 Downregulated
## 5 2509.76930 2248.70571 TRUE Myo18a Upregulated
## 6 2394.40087 2179.07365 TRUE Fasl Upregulated
#write.csv(resdata_sig, file="Tcf7_ko_versus_wt_DEG.csv")
# Count the number of rows where diffexpressed is "Upregulated"
upregulated_count <- nrow(resdata_sig[resdata_sig$diffexpressed == "Upregulated", ])
upregulated_count
## [1] 560
downregulated_count <- nrow(resdata_sig[resdata_sig$diffexpressed == "Downregulated", ])
downregulated_count
## [1] 412
p <- ggplot(resdata_gene) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=diffexpressed)) +
theme_classic() +
ggtitle("TCF7 KO versus WT") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("blue", "black", "red"),
labels = c("Downregulated", "Not Significant", "Upregulated")) +
scale_x_continuous(limits = c(-20, 20)) +
coord_cartesian(ylim=c(0, 300), clip = "off")
# Add annotation
p + annotate("text", x = 15, y = 10, label = "560", colour = "red", size = 8, hjust = 0) +
annotate("text", x = -15, y = 10, label = "412", colour = "blue", size = 8, hjust = 1)
In its most basic usage, EnhancedVolcano requires: + data-frame (i.e. resdata_gene), + a column of rownames (i.e. lab =), + a column name containing log2 fold changes (i.e. x =log2FoldChange) + a column name containing nominal or adjusted pvalues (i.e. y = padj)
There are numerous arguments that can be used to customize the volcano plot with EnhancedVolcano. We will begin with the most fundamental and most important of these options which includes visualizing sections of the volcano plot based off p adjusted cutoff of < 0.05 and absolute log2 of 1. For both, vertical and horizontal lines will be drawn to show these cutoffs exactly where we want them. Note, this was done in the plot above, however were you confident where these lines were drawn?
EnhancedVolcano(resdata_gene,
lab = as.character(resdata_gene$symbol),
x = 'log2FoldChange',
y = 'padj',
title = "TCF7_KO versus WT",
FCcutoff = 1,
pCutoff = 0.05)
Now, lets convert our colors to red and black like the article. Naturally, EnhancedVolcano will color based on four quadrants. These quadrants are shown (in order):
1) NS - shown in black 2) Log2FC - shown in green 3) p-value - shown in blue 4) p-value & Log2FC - shown in red
In addition, let’s move the legend position to the right of the plot.
EnhancedVolcano(resdata_gene,
lab = as.character(resdata_gene$symbol),
x = 'log2FoldChange',
y = 'padj',
title = "TCF7_KO versus WT",
subtitle = "Differential expression",
FCcutoff = 1,
pCutoff = 0.05,
labSize = 4,
axisLabSize = 12,
col=c('black', 'black', 'black', 'red3'),
legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
legendPosition = 'right',
legendLabSize = 16,
legendIconSize = 5.0)
There might be too many genes labeled here making the plot look cluttered. Let’s label a few select DE genes. First, lets get a complete of list of our DEGs using:
resdata_sig <- as.data.frame(resdata_gene) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
head(resdata_sig)
## input_id baseMean log2FoldChange lfcSE stat
## 1 ENSMUSG00000000301 36.06472 -1.245867 0.34387653 -3.623008
## 2 ENSMUSG00000000303 740.39042 4.274595 0.35738885 11.960628
## 3 ENSMUSG00000000489 96.63058 2.107325 0.23386230 9.010966
## 4 ENSMUSG00000000530 127.87156 -2.016456 0.19967045 -10.098920
## 5 ENSMUSG00000000631 1771.29476 1.183873 0.07321316 16.170217
## 6 ENSMUSG00000000817 1200.51462 6.961123 0.49820582 13.972385
## pvalue padj WT_CD8_rep1 WT_CD8_rep2 WT_CD8_rep3 TCF1_KO_CD8_rep1
## 1 2.91197e-04 1.75073e-03 44.961485 48.61312 58.43048 14.50715
## 2 5.71000e-33 3.32000e-31 92.920402 57.19191 68.16889 1068.08907
## 3 2.04000e-19 5.88000e-18 37.967476 33.36195 37.87160 132.37776
## 4 5.59000e-24 2.15000e-22 225.806568 216.37605 173.12733 56.21521
## 5 8.18000e-59 1.24000e-56 1067.085904 1046.61189 1136.14814 2619.44765
## 6 2.30000e-44 2.15000e-42 5.994865 28.59595 22.72296 2572.29940
## TCF1_KO_CD8_rep2 TCF1_KO_CD8_rep3 threshold symbol diffexpressed
## 1 25.30009 24.57602 TRUE Pemt Downregulated
## 2 1986.56336 1169.40889 TRUE Cdh1 Upregulated
## 3 148.76455 189.44014 TRUE Pdgfb Upregulated
## 4 46.55217 49.15204 TRUE Acvrl1 Downregulated
## 5 2509.76930 2248.70571 TRUE Myo18a Upregulated
## 6 2394.40087 2179.07365 TRUE Fasl Upregulated
write.csv(resdata_sig, file="Tcf7_ko_versus_wt_DEG.csv")
Looking at the supplementary data, the authors identified some genes such as ‘Cd19’, ‘Ccr7’, and ‘Gzma’ as biologically significant to their findings. In addition, this article focusing on two transcription factors ‘Tcf7’ and ‘Lef1’ so it would be a good idea to view those. I also am including ‘Actn1’ as more of positive control. It is the most signficantly downregulated gene in our dataset and I want to make sure it shows up on the correct side of the graph.
EnhancedVolcano(resdata_gene,
lab = as.character(resdata_gene$symbol),
x = 'log2FoldChange',
y = 'padj',
selectLab = c('Ccr7', 'Cd19', 'Actn1', 'Gzma', 'Ccl5', 'Tcf7', 'Lef1'),
title = "TCF7_KO versus WT",
subtitle = "Differential expression",
FCcutoff = 1,
pCutoff = 0.05,
labSize = 4,
axisLabSize = 12,
col=c('black', 'black', 'black', 'red3'),
legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
legendPosition = 'right',
legendLabSize = 16,
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black')
It looks like a few genes are hard to read. We can fix this by adding a boxed label. This will make the gene name pop out and will be given a white background. I am also choosing to bold the gene name. In addition, lets get rid of the major and minor grid lines as they are taking away from the image aesthetics.
EnhancedVolcano(resdata_gene,
lab = as.character(resdata_gene$symbol),
x = 'log2FoldChange',
y = 'padj',
selectLab = c('Ccr7', 'Cd19', 'Actn1', 'Gzma', 'Ccl5', 'Tcf7', 'Lef1'),
title = "TCF7_KO versus WT",
subtitle = "Differential expression",
FCcutoff = 1,
pCutoff = 0.05,
labSize = 4,
axisLabSize = 12,
col=c('black', 'black', 'black', 'red3'),
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
legendPosition = 'right',
legendLabSize = 16,
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black',
gridlines.major = FALSE,
gridlines.minor = FALSE)
ggsave("TCF7_KO_verus_WT-Enhancedvolano.png")
If you wanted to play around with colors (up versus down regulated) below is a script to help you get started.
keyvals <- ifelse(
resdata_gene$log2FoldChange < -2, 'royalblue',
ifelse(resdata_gene$log2FoldChange > 2, 'red',
'black'))
keyvals[is.na(keyvals)] <- 'black'
names(keyvals)[keyvals == 'red'] <- 'Upregulated'
names(keyvals)[keyvals == 'black'] <- 'NA'
names(keyvals)[keyvals == 'royalblue'] <- 'Downregulated'
EnhancedVolcano(resdata_gene,
lab = as.character(resdata_gene$symbol),
x = 'log2FoldChange',
y = 'padj',
selectLab = rownames(resdata_gene)[which(names(keyvals) %in%
c('Upregulated','Downregulated'))],
xlab = bquote(~Log[1]~ 'fold change'),
colCustom = keyvals,
title = "TCF7_KO versus WT",
subtitle = "Differential expression",
FCcutoff = 1,
pCutoff = 0.05,
labSize = 4,
axisLabSize = 12,
legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
legendPosition = 'left',
legendLabSize = 16,
legendIconSize = 5.0,
gridlines.major = FALSE,
gridlines.minor = FALSE)
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] fstcore_0.9.14 genekitr_1.2.5 tidyr_1.3.0
## [4] dplyr_1.1.2 EnhancedVolcano_1.14.0 ggrepel_0.9.4
## [7] ggplot2_3.4.4
##
## loaded via a namespace (and not attached):
## [1] shadowtext_0.1.2 fastmatch_1.1-4 systemfonts_1.0.5
## [4] plyr_1.8.9 igraph_1.4.2 lazyeval_0.2.2
## [7] splines_4.2.0 BiocParallel_1.32.6 usethis_2.2.2
## [10] GenomeInfoDb_1.34.9 urltools_1.7.3 digest_0.6.33
## [13] yulab.utils_0.0.6 htmltools_0.5.7 GOSemSim_2.22.0
## [16] viridis_0.6.4 GO.db_3.15.0 fansi_1.0.6
## [19] magrittr_2.0.3 memoise_2.0.1 remotes_2.4.2.1
## [22] openxlsx_4.2.5.2 Biostrings_2.66.0 graphlayouts_1.0.0
## [25] enrichplot_1.16.2 prettyunits_1.2.0 colorspace_2.1-0
## [28] blob_1.2.4 textshaping_0.3.7 xfun_0.41
## [31] callr_3.7.3 crayon_1.5.2 RCurl_1.98-1.13
## [34] jsonlite_1.8.8 scatterpie_0.2.1 ape_5.7-1
## [37] glue_1.6.2 polyclip_1.10-6 gtable_0.3.4
## [40] zlibbioc_1.44.0 XVector_0.38.0 pkgbuild_1.4.0
## [43] BiocGenerics_0.44.0 scales_1.2.1 DOSE_3.22.1
## [46] DBI_1.1.3 miniUI_0.1.1.1 Rcpp_1.0.11
## [49] viridisLite_0.4.2 xtable_1.8-4 progress_1.2.3
## [52] gridGraphics_0.5-1 tidytree_0.4.2 bit_4.0.5
## [55] europepmc_0.4.3 stats4_4.2.0 profvis_0.3.8
## [58] htmlwidgets_1.6.2 httr_1.4.7 fgsea_1.24.0
## [61] RColorBrewer_1.1-3 ellipsis_0.3.2 urlchecker_1.0.1
## [64] pkgconfig_2.0.3 farver_2.1.1 sass_0.4.6
## [67] utf8_1.2.4 labeling_0.4.3 ggplotify_0.1.2
## [70] tidyselect_1.2.0 rlang_1.1.2 reshape2_1.4.4
## [73] later_1.3.0 AnnotationDbi_1.60.0 munsell_0.5.0
## [76] tools_4.2.0 cachem_1.0.8 downloader_0.4
## [79] cli_3.6.2 generics_0.1.3 RSQLite_2.3.1
## [82] devtools_2.4.5 evaluate_0.23 stringr_1.5.0
## [85] fastmap_1.1.1 ragg_1.2.5 yaml_2.3.8
## [88] ggtree_3.4.4 processx_3.8.3 knitr_1.45
## [91] bit64_4.0.5 fs_1.6.3 tidygraph_1.2.3
## [94] zip_2.3.0 purrr_1.0.2 KEGGREST_1.38.0
## [97] ggraph_2.1.0 nlme_3.1-162 mime_0.12
## [100] aplot_0.2.2 DO.db_2.9 ggvenn_0.1.10
## [103] xml2_1.3.4 compiler_4.2.0 rstudioapi_0.15.0
## [106] png_0.1-8 treeio_1.20.2 tibble_3.2.1
## [109] tweenr_2.0.2 bslib_0.4.2 stringi_1.7.12
## [112] highr_0.10 ps_1.7.5 lattice_0.22-5
## [115] Matrix_1.5-4 vctrs_0.6.4 pillar_1.9.0
## [118] lifecycle_1.0.4 triebeard_0.4.1 jquerylib_0.1.4
## [121] data.table_1.14.10 cowplot_1.1.1 bitops_1.0-7
## [124] httpuv_1.6.9 patchwork_1.1.3 qvalue_2.28.0
## [127] R6_2.5.1 promises_1.2.1 gridExtra_2.3
## [130] IRanges_2.32.0 sessioninfo_1.2.2 codetools_0.2-19
## [133] pkgload_1.3.3 MASS_7.3-60 withr_2.5.2
## [136] S4Vectors_0.36.2 GenomeInfoDbData_1.2.9 parallel_4.2.0
## [139] hms_1.1.3 clusterProfiler_4.4.4 fst_0.9.8
## [142] grid_4.2.0 ggfun_0.1.3 rmarkdown_2.25
## [145] ggforce_0.4.1 Biobase_2.58.0 shiny_1.7.4
## [148] geneset_0.2.7
title: “EnhancedVolcano: publication-ready volcano plots with enhanced colouring and labeling” author: “Kevin Blighe, Sharmila Rana, Myles Lewis”
Note: if you use Enhanced Volcano in published research, please cite:
Blighe K, Rana S, Lewis M (2022). EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling. R package version 1.16.0, https://github.com/kevinblighe/EnhancedVolcano.